1. INTRODUCTION

Artificial intelligence (AI) methods and algorithms help to provide solutions in many areas [1]-[4]. One category of artificial intelligence is machine learning, a type of statistical learning in which each item of the database is described by several characteristics or attributes. Machine learning models have been applied in many areas of research [5]-[9]. Another category of AI is deep learning, which is also a type of statistical learning but extracts features directly from the input data. Deep learning made it possible to create deepfakes. The term deepfake comes from an anonymous user of the Reddit platform who went by the name 'deepfakes' (deep learning + fakes) and who shared the first deepfakes by swapping the faces of celebrities into pornographic clips [10]. Most deepfakes follow a method in which the real face of a certain person is replaced by a fake image of another person, as shown in Figure 1.

Social media platforms are among the platforms most targeted by deepfake technology, because they make it easy to spread rumors, lies, and fabricated news. At the same time, the 'infopocalypse' leads people to trust any piece of information that comes through their social networks, which include close family members and friends. In fact, many people accept anything that supports their views and preferences, even if they know it is fake. Recently, high-quality tools for producing realistic deepfakes have become increasingly available as open source and can be used for deceptive operations. This allows users with little skill to easily create altered videos by swapping faces, synthesizing speech, and changing expressions [11]. One example is a fake Queen Elizabeth II who appeared on TV screens at Christmas in a "deepfake" speech aired by Channel 4 in the U.K.

There are several reasons why deepfake disinformation could have a serious societal impact, which makes its effects worth investigating experimentally. First, deepfakes can produce convincing disinformation: automatically generated sounds and images can be accepted as original, and an ordinary citizen may struggle to tell fact from fiction. Second, deepfakes can be used to amplify malinformation, for example, by posting a fabricated video of Queen Elizabeth II giving a speech on an official occasion. Third, deepfakes may enable the large-scale production of disinformation: if a political actor has enough training data, they could create many different, realistic deepfakes of the same politician in a short time. Combined with political micro-targeting (PMT) techniques, deepfakes could be particularly impactful. We are not there yet: deepfakes do not yet reach people widely in the public sphere, let alone micro-targeted deepfakes. Nevertheless, (micro-targeted) deepfakes have characteristics that make them potentially powerful tools for disinformation in the near future [12].

From the above, we can see how dangerous this technology is and how strongly it can affect society. It is therefore important to shed light on it and to develop fingerprinting, forensic, and verification techniques that distinguish fake videos from original ones. Some current methods detect fake videos using deep learning techniques or traditional forensic tools, and these are reviewed in this survey; in general, however, deepfake detection tools remain in their early stages. Our contribution is therefore to examine how the performance of a deepfake detector can increase when both concepts (deep learning + forensic tools) are combined in the same model. The rest of this article is organized as follows: section 2 shows how deepfakes are created, section 3 presents methods for detecting deepfakes, and section 4 presents the discussion and conclusion.


Figure 1. Example of deepfakes [13] 


2. CREATION OF DEEPFAKES 

The term deepfake can be used for any video or image that has been manipulated using deep neural networks. This manipulation is not easy to perform on standard computers; it requires high-end desktops with powerful graphics cards or, better still, cloud computing resources, which reduce the processing time needed to train the deep networks that create deepfakes. Deepfakes involving facial manipulation can be classified into the following categories [14]:


2.1. Face swap 

Face swap is currently the most common category of face manipulation. It is accomplished using a type of deep neural network called an autoencoder, which is used for feature extraction and image compression [15]-[17]. The autoencoder (encoder + decoder) structure was used by the Reddit user in the first attempts at deepfake creation [18], [19]. The basic idea behind an autoencoder is to encode the input data into a smaller, compressed representation and then to reconstruct the original data from this intermediate representation. The process of creating deepfakes is shown in Figure 2. Generally, this process requires training two autoencoders, each on a set of video clips of one of the two persons whose identities will be exchanged; the two autoencoders share the same encoder but have separate decoders. After training, the frames of the target video are encoded and passed to the other person's decoder to produce a deepfake face [20]. In general, deepfakes created with autoencoders make the swapped face resemble both the target face and the source face, without paying much attention to the difference between the identities of the source and target faces. To produce more convincing swapped faces, Yang et al. [21] incorporated cross-identity adversarial training.
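The training and swapping steps of autoencoder-based face swap can be sketched in a few lines. This is a toy illustration under loud assumptions: the "faces" are random 64-dimensional vectors lying in a person-specific subspace, the encoder and decoders are single linear layers rather than the convolutional networks used in practice, the two autoencoders are assumed to share one encoder (a common design), and all names (`make_faces`, `train_face_swap`) and hyperparameters are hypothetical. What it does show is the key structure: one encoder trained on both identities, one decoder per identity, and the swap performed by decoding person A's latent codes with person B's decoder.

```python
import numpy as np


def make_faces(rng, n, d, k):
    """Hypothetical toy 'faces': n flattened images of one person that lie
    in a person-specific k-dimensional subspace of R^d."""
    basis = rng.normal(size=(k, d))
    return rng.normal(size=(n, k)) @ basis / np.sqrt(k)


def train_face_swap(faces_a, faces_b, latent=16, lr=0.5, steps=500, seed=0):
    """Gradient descent on the summed mean-squared reconstruction error of
    two linear autoencoders that share one encoder."""
    rng = np.random.default_rng(seed)
    d = faces_a.shape[1]
    enc = 0.3 * rng.normal(size=(d, latent)) / np.sqrt(d)         # shared
    dec_a = 0.3 * rng.normal(size=(latent, d)) / np.sqrt(latent)  # person A only
    dec_b = 0.3 * rng.normal(size=(latent, d)) / np.sqrt(latent)  # person B only
    losses = []
    for _ in range(steps):
        z_a, z_b = faces_a @ enc, faces_b @ enc     # latent codes
        r_a = z_a @ dec_a - faces_a                  # reconstruction residuals
        r_b = z_b @ dec_b - faces_b
        losses.append(float(np.mean(r_a ** 2) + np.mean(r_b ** 2)))
        g_a = 2.0 * r_a / r_a.size                   # dLoss / dReconstruction
        g_b = 2.0 * r_b / r_b.size
        # The shared encoder receives gradient from both identities.
        grad_enc = faces_a.T @ (g_a @ dec_a.T) + faces_b.T @ (g_b @ dec_b.T)
        grad_dec_a = z_a.T @ g_a
        grad_dec_b = z_b.T @ g_b
        enc -= lr * grad_enc
        dec_a -= lr * grad_dec_a
        dec_b -= lr * grad_dec_b
    return enc, dec_a, dec_b, losses


rng = np.random.default_rng(1)
faces_a = make_faces(rng, n=200, d=64, k=8)
faces_b = make_faces(rng, n=200, d=64, k=8)
enc, dec_a, dec_b, losses = train_face_swap(faces_a, faces_b)

# Face swap: encode person A's frames, then decode with person B's decoder.
fake = (faces_a @ enc) @ dec_b
print(fake.shape)  # (200, 64)
```

Because the encoder must serve both people, it is pushed toward a representation of what the two sets of faces have in common, while each decoder learns to render one particular identity from that representation.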


2.2. Face synthesis 

Generative adversarial networks (GANs) are used in this category to create realistic faces of people who do not exist. The emergence of GANs made it possible to produce surprisingly realistic results, which led to the birth of deepfakes [22]. The most popular method is StyleGAN (a special type of GAN), which was used to produce the images on the well-known "This Person Does Not Exist" website [23]. Researchers later improved the StyleGAN architecture and proposed a new version, StyleGAN2 [24].
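The adversarial idea behind these models can be summarized by the standard minimax objective of the original GAN formulation. StyleGAN and StyleGAN2 use modified losses and architectures, so this is only the general principle, not their exact training objective. Here G is the generator, D the discriminator, p_data the distribution of real face images, and p_z the noise prior from which G draws its input; these symbols are introduced for illustration and are not defined elsewhere in this survey.

```latex
\min_{G}\max_{D} V(D,G) =
  \mathbb{E}_{x\sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z\sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator is trained to assign high probability to real faces and low probability to generated ones, while the generator is trained to fool it; as training progresses, the generated faces become harder to distinguish from real ones.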
Figure 2. Deepfake creation process. Each autoencoder is trained on a set of images of one person


2.3. Facial attributes and expression 

Deepfake images can be generated by manipulating facial attributes and expressions, such as skin color, gender, age, and facial expression. GANs are also used to perform this image-to-image translation. A prominent method for this purpose is StarGAN (a special type of GAN) [25]. For facial expression manipulation, Bodur et al. [26] proposed an end-to-end deep network that uses two generative adversarial networks (an RGB network and a depth network) to translate the source RGB image to a given target label.
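StarGAN handles many attribute translations with a single generator that is conditioned on the target attribute. In the StarGAN design, the target domain label is spatially replicated and concatenated with the input image before it enters the generator. The NumPy sketch below builds only that conditioned input. The one-hot encoding, the image size, the number of domains, and the function name `stargan_style_input` are assumptions made for illustration; StarGAN can also use multi-attribute binary label vectors.

```python
import numpy as np


def stargan_style_input(image, target_label, num_domains):
    """Build a StarGAN-style generator input: a one-hot target-domain label
    is replicated over every pixel and concatenated to the image channels.

    image: float array of shape (C, H, W); target_label: int in [0, num_domains).
    Returns an array of shape (C + num_domains, H, W).
    """
    if not 0 <= target_label < num_domains:
        raise ValueError("target_label out of range")
    _, h, w = image.shape
    one_hot = np.zeros(num_domains, dtype=image.dtype)
    one_hot[target_label] = 1.0
    # One constant H x W plane per domain: all ones for the target, zeros elsewhere.
    label_maps = np.broadcast_to(one_hot[:, None, None], (num_domains, h, w))
    return np.concatenate([image, label_maps], axis=0)


# Hypothetical setup: an RGB face crop and 5 attribute domains.
face = np.random.default_rng(0).random((3, 128, 128)).astype(np.float32)
x = stargan_style_input(face, target_label=2, num_domains=5)
print(x.shape)  # (8, 128, 128)
```

Feeding the label as extra image channels lets the same convolutional generator learn every translation, instead of training one network per pair of attributes.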


3. DEEPFAKE DETECTION 
3.1. Deep learning based methods 

Because deepfake technology affects many areas, research papers in the last two years have increasingly focused on it, explaining the related challenges and techniques [27]. Images produced by deepfake algorithms usually need further transformations (such as affine warping) to fit the area to be forged in the source video, and these transformations leave distinctive artifacts. Deep learning networks can be trained to detect these artifacts. Li and Lyu [28] trained four convolutional neural network (CNN) models: the residual networks ResNet152, ResNet101, and ResNet50 [29], and VGG16 (VGG stands for Visual Geometry Group) [30]. The resulting CNN models can reliably distinguish real videos from tampered ones. In [31], a CNN model is developed to generalize its detection capability to different but related manipulations. To reduce the number of parameters required by traditional CNNs, capsule networks are used in [32] to build a lightweight detection system.
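Li and Lyu's paper reportedly obtains negative training examples by simulating the warping step directly on real faces instead of running a full deepfake generator. The sketch below is a simplified version of that idea under stated assumptions: a grayscale frame stored as a NumPy array, a hypothetical face box, pixel-repetition upsampling, and a 3x3 mean filter standing in for Gaussian blur. The `laplacian_variance` check only illustrates the resolution mismatch between the pasted face and its surroundings that a CNN would learn to pick up; it is not itself a detector.

```python
import numpy as np


def box_blur(img, radius=1):
    """Mean filter with edge padding (a stand-in for Gaussian blur)."""
    size = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / size ** 2


def simulate_warp_artifact(img, box, factor=4):
    """Mimic a pasted low-resolution face: downsample the face box, upsample
    it back by pixel repetition, blur it, and paste it into the frame."""
    top, left, h, w = box  # hypothetical face box; h and w divisible by factor
    face = img[top:top + h, left:left + w]
    small = face[::factor, ::factor]
    restored = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    out = img.astype(float).copy()
    out[top:top + h, left:left + w] = box_blur(restored)
    return out


def laplacian_variance(region):
    """Variance of a 4-neighbour Laplacian: low values mean little fine detail."""
    lap = (-4 * region[1:-1, 1:-1] + region[:-2, 1:-1] + region[2:, 1:-1]
           + region[1:-1, :-2] + region[1:-1, 2:])
    return float(lap.var())


rng = np.random.default_rng(0)
frame = rng.random((96, 96))        # stand-in for a grayscale video frame
box = (32, 32, 32, 32)              # (top, left, height, width), hypothetical
fake = simulate_warp_artifact(frame, box)
face_real = frame[32:64, 32:64]
face_fake = fake[32:64, 32:64]
print(laplacian_variance(face_real) > laplacian_variance(face_fake))  # True
```

Pairs of original and simulated frames generated this way could then serve as the real and fake classes when training one of the CNNs listed above.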

Instead of relying on visual artifacts within a single frame, temporal features can also be used for deepfake detection. Inconsistencies among video frames can be discovered using a recurrent neural network (RNN) [33]. The two-step analysis proposed in [34] consists of extracting frame features with a CNN and then recognizing temporal inconsistencies between frames caused by the face-swapping process, as shown in Figure 3.
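The second step of this pipeline (per-frame feature vectors, then an LSTM, then a real/fake decision) can be sketched as a NumPy forward pass. Assumptions: the per-frame CNN features are replaced by random 128-dimensional vectors, the LSTM and output weights are random rather than trained, and the class name `TinyLSTMDetector` and all sizes are hypothetical. The gate updates are the standard LSTM equations.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyLSTMDetector:
    """Forward pass only: an LSTM reads per-frame feature vectors (assumed to
    come from a CNN) and a logistic layer maps the final hidden state to
    P(deepfake). Weights are random here; a real detector would be trained."""

    def __init__(self, feat_dim, hidden, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(feat_dim + hidden)
        # One stacked matrix for the input, forget, output, and candidate gates.
        self.w = rng.normal(0.0, scale, size=(feat_dim + hidden, 4 * hidden))
        self.b = np.zeros(4 * hidden)
        self.w_out = rng.normal(0.0, scale, size=hidden)
        self.hidden = hidden

    def predict(self, frame_features):
        """frame_features: array of shape (T, feat_dim), one row per frame."""
        h = np.zeros(self.hidden)
        c = np.zeros(self.hidden)
        for x_t in frame_features:
            gates = np.concatenate([x_t, h]) @ self.w + self.b
            i, f, o, g = np.split(gates, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell memory
            h = sigmoid(o) * np.tanh(c)                   # new hidden state
        # The final h acts as the sequence descriptor for the whole clip.
        return float(sigmoid(h @ self.w_out))


features = np.random.default_rng(1).normal(size=(20, 128))  # 20 frames x 128 features
p_fake = TinyLSTMDetector(feat_dim=128, hidden=32).predict(features)
print(0.0 < p_fake < 1.0)  # True
```

Because the cell state carries information from frame to frame, a trained version of this model can react to flickering or identity drift across frames that a single-frame CNN would not see.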


[Figure 3 diagram: input sequence → CNN → feature vector → LSTM → sequence descriptor → detection network → {Pristine, Deepfake}]

Figure 3. A CNN first extracts frame features, which are then fed to a long short-term memory (LSTM) network for analysis
Eye blinking can also provide a useful cue for detecting fake videos, because the blinking rate in fake videos may be lower or higher than in real ones. Li et al. [35] combine a CNN model with a recurrent neural network to form a long-term recurrent CNN (LRCN) [36] for detecting eyelid motion. Spatio-temporal features, i.e., information gathered over both space and time, can also be used in the analysis. Nguyen et al. [37] extract spatio-temporal features from a contiguous frame sequence and learn them using a 3-D CNN model, achieving over 99% deepfake detection accuracy. The dynamic prototype network (DPNet) uses dynamic representations (i.e., prototypes) to explain deepfake temporal artifacts [38]. Specifically, DPNet focuses on temporal inconsistencies by learning prototype representations in the latent space and then makes predictions based on how similar a test video's dynamics are to the learned dynamic prototypes. A two-stream convolutional neural network (TwoStreamNet) is proposed in [39]: the two streams recognize frequency-domain and spatial-domain artifacts separately, and their outputs are fed to the end of the network for classification. Table 1 summarizes the above methods and the datasets they use. Finally, detecting fake news using deep learning is also an active research topic [40]-[43].
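Once a model such as LRCN has produced a per-frame probability that the eye is closed, turning it into a blink rate is simple post-processing. The sketch below assumes such a probability sequence and a known frame rate; the threshold, the minimum run length, and the function names are hypothetical. Any decision rule comparing the rate with that of real videos is left out, because the text gives no reference values.

```python
def count_blinks(closed_prob, threshold=0.5, min_frames=1):
    """Count blinks in a per-frame sequence of eye-closed probabilities.

    A blink is a run of at least `min_frames` consecutive frames whose
    probability exceeds `threshold`. Both defaults are hypothetical.
    """
    blinks, run = 0, 0
    for p in closed_prob:
        if p > threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:  # a blink still in progress at the end of the clip
        blinks += 1
    return blinks


def blinks_per_minute(closed_prob, fps, **kwargs):
    """Blink count normalized by clip duration (len(closed_prob) / fps seconds)."""
    minutes = len(closed_prob) / fps / 60.0
    return count_blinks(closed_prob, **kwargs) / minutes


fps = 30
probs = [0.05] * 300                   # 10 s of hypothetical LRCN outputs
for start in (40, 140, 240):           # three 3-frame eye closures
    probs[start:start + 3] = [0.9] * 3
print(count_blinks(probs))                       # 3
print(round(blinks_per_minute(probs, fps), 2))   # 18.0
```

Counting runs rather than individual frames keeps one long closure from being counted as several blinks.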


Table 1. Summary of the main deep learning methods used for deepfake detection

Author | DNN model | Basic idea | Dataset used
Li and Lyu [28] | CNN | Four CNN models are used to detect artifacts between the face area in a fake video and its surrounding regions. | UADFV from [27], consisting of 49 fake videos and 49 real videos; TIMIT deepfake video dataset [44], with two sets of fake videos of different resolutions and quality.
Cozzolino et al. [31] | CNN | Achieves generalization by developing a CNN model. | FaceForensics dataset [45].
Güera and Delp [34] | CNN+RNN | A CNN extracts frame-level features; the extracted features are then used to train an RNN. | Approximately 300 videos collected from different websites, plus 300 videos chosen randomly from the HOHA dataset [46].
Li et al. [35] | CNN+RNN | The eye area is cropped from the frame sequences, and the cropped eye sequences are passed to an LRCN (three parts: CNN feature extraction, sequence learning, and state prediction). | Eye blinking video (EBV) dataset.
Nguyen et al. [32] | Capsule-Forensics | A pre-processing step is applied to the input image, which then passes through part of a VGG19 network [47] pre-trained on the ILSVRC database [48] before entering the capsule network; finally, a post-processing step works in agreement with the pre-processing step. | FaceForensics database [45].
Trinh et al. [38] | Dynamic Prototype Network (DPNet) | Focuses on temporal inconsistencies by learning prototype representations in the latent space, then makes predictions based on how similar a test video's dynamics are to the learned dynamic prototypes. | Training: FaceForensics++ [45]. Testing: four datasets: 1) FaceForensics++ [45]; 2) DeepfakeDetection [49]; 3) DeeperForensics-1.0 [50]; 4) Celeb-DF [51].
Yousaf et al. [39] | Two-stream CNN | The two network streams capture frequency-domain and spatial-domain artifacts separately, and their outputs are fed to the end of the network for classification. | Training: fake images generated by ProGAN [52]. Testing: fake images generated by other GANs.


3.2. Multimedia forensics based methods 

This section reviews recent research ideas in multimedia forensics. To determine whether multimedia content (video, audio, or image) is fake, different methods are used to expose defects or anomalies in the content [53]-[55]. One such exploitable anomaly comes from manufacturing defects, in which sensor elements deviate from their expected behavior. These deviations form a noise-like pattern called photo-response non-uniformity (PRNU) noise, which is often regarded as the fingerprint of the camera sensor that captured a digital image [56]. Rodriguez et al. [57] used this noise pattern to detect deepfakes. Pu et al. [58] proposed a deepfake detection method called NoiseScope, which consists of four main parts: a noise extractor, a fingerprint extractor, a fingerprint classifier, and a fake image detector. This method detects deepfakes blindly, i.e., the detector has no information about the generative model used and has access only to real data. Its accuracy reaches about 99.68% in detecting GAN images. A new approach to deepfake detection based on the "JPEG ghost" algorithm is proposed in [59]; it distinguishes tampered faces from real ones by analyzing inconsistent compression errors.
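The basic PRNU workflow (extract a noise residual, average residuals into a camera fingerprint, and correlate a query image's residual with that fingerprint) can be sketched in NumPy. This is a minimal sketch under strong assumptions: images are simulated flat scenes multiplied by (1 + PRNU) plus Gaussian noise, a 3x3 mean filter replaces the wavelet denoisers normally used for PRNU extraction, and all function names are hypothetical. It does not reproduce the specific methods of [57] or [58]; the intuition it illustrates is that content not captured by the camera sensor (for example, a GAN-generated face) should correlate poorly with that camera's fingerprint.

```python
import numpy as np


def denoise(img):
    """3x3 mean filter with edge padding; a crude stand-in for the wavelet
    denoisers normally used in PRNU extraction."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9.0


def noise_residual(img):
    """What the denoiser removes: sensor noise (including PRNU) plus fine detail."""
    return img - denoise(img)


def estimate_fingerprint(images):
    """Average the noise residuals of several images from the same camera."""
    return np.mean([noise_residual(im) for im in images], axis=0)


def ncc(a, b):
    """Normalized cross-correlation between two equally sized arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))


rng = np.random.default_rng(0)
shape = (64, 64)
prnu_a = rng.normal(0.0, 0.02, shape)  # hypothetical sensor pattern, camera A
prnu_b = rng.normal(0.0, 0.02, shape)  # hypothetical sensor pattern, camera B


def shoot(prnu):
    """Simulated photo of a flat scene: brightness * (1 + PRNU) + sensor noise."""
    brightness = rng.uniform(0.3, 0.8)
    return brightness * (1.0 + prnu) + rng.normal(0.0, 0.005, shape)


fingerprint_a = estimate_fingerprint([shoot(prnu_a) for _ in range(10)])
score_same = ncc(noise_residual(shoot(prnu_a)), fingerprint_a)
score_other = ncc(noise_residual(shoot(prnu_b)), fingerprint_a)
print(score_same > score_other)  # True
```

Averaging several residuals suppresses the random noise in each image while the fixed PRNU pattern adds up, which is why the fingerprint correlates strongly with new images from the same camera and weakly with images from another one.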
3.3. Multimedia forensics and deep learning based methods

Deepfake techniques can have a detrimental effect on people and society if they are not controlled in time. They exploit artificial intelligence methods and machine learning technology for unlawful purposes, such as inciting against a certain party or public figure by creating the illusion that a person said something they did not actually say. These changes are not noticeable to the viewer, who is thus led to believe them [13]. Before the emergence of deep learning methods, forensic tools had long been used to automatically detect tampering at the physical, digital, and semantic levels [60]. Recently, deep learning has attracted great interest from researchers because it can learn features directly from the data [53].

Detection methods are still in their early stages and face many challenges. So why not exploit the advantages of both multimedia forensic methods and deep learning methods to obtain more robust deepfake detection algorithms? For example, detecting differences in image compression levels using error level analysis (ELA), followed by a CNN model, increases detection accuracy [61]. This works because all regions of an unedited JPEG image share the same compression level, so a manipulation applied to the image disturbs the compression levels between the modified area and the surrounding areas. Also, Habeeba et al. [62] proposed a two-step verification method that uses a three-layer neural network (NN) to classify fake videos, followed by a confirmation step that compares the Laplacian variance of different patches in the face. They achieve good results in terms of accuracy and computational requirements.
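The Laplacian-variance comparison used in the confirmation step can be sketched as follows. Assumptions: the face crop is a grayscale NumPy array, patches are non-overlapping 16x16 blocks, and the decision rule (flag the face if its sharpest patch is more than `ratio` times sharper than its blurriest one) together with all names and thresholds is hypothetical, not the rule used in [62]. The "fake" is simulated by smoothing a central region, mimicking a blended-in face area with less fine detail than its surroundings.

```python
import numpy as np


def laplacian_variance(patch):
    """Variance of the 4-neighbour Laplacian; low values indicate blur."""
    lap = (-4 * patch[1:-1, 1:-1] + patch[:-2, 1:-1] + patch[2:, 1:-1]
           + patch[1:-1, :-2] + patch[1:-1, 2:])
    return float(lap.var())


def patch_sharpness(face, patch=16):
    """Laplacian variance for each non-overlapping patch x patch block."""
    h, w = face.shape
    return np.array([[laplacian_variance(face[r:r + patch, c:c + patch])
                      for c in range(0, w - patch + 1, patch)]
                     for r in range(0, h - patch + 1, patch)])


def inconsistent_sharpness(face, patch=16, ratio=10.0):
    """Flag a face whose sharpest patch is `ratio` times sharper than its
    blurriest one. The rule and the `ratio` value are hypothetical."""
    scores = patch_sharpness(face, patch)
    return bool(scores.max() > ratio * scores.min())


def box_blur(img):
    """3x3 mean filter with edge padding, used here to simulate smoothing."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9.0


rng = np.random.default_rng(0)
real_face = rng.random((64, 64))       # stand-in for a grayscale face crop
fake_face = real_face.copy()
fake_face[16:48, 16:48] = box_blur(real_face)[16:48, 16:48]  # smoothed region
print(inconsistent_sharpness(real_face), inconsistent_sharpness(fake_face))  # False True
```

On real faces some variation between patches is natural (for example, smooth cheeks versus detailed eyes), which is one reason such a check is better suited as a confirmation step after a learned classifier than as a detector on its own.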


4. CONCLUSION 

In general, the quality of deepfakes is increasing greatly, so it is necessary to pay attention to deepfake detection methods and to improving their robustness. In this survey, we focused on describing the deepfake landscape. The methods used to detect deepfakes were also explained from two viewpoints (deep learning and multimedia forensics). We provided a brief overview of these two concepts and of how combining them increases deepfake detection accuracy. Finally, we hope this information helps the community understand and counter malicious deepfakes.